
PR to apply for E2E OLS evaluation framework for AAP chatbot #47

Merged: 5 commits merged into main from aap_38439 on Jan 31, 2025

Conversation

justjais

Description

PR to apply for E2E OLS evaluation framework for AAP chatbot

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Create a virtual environment and install the necessary packages via make install-deps; this requires cloning the ansible-chatbot-service repo first.
  • To run the E2E tests locally, configure olsconfig.yaml and copy it to the parent directory; a sample olsconfig.yaml for the AAP chatbot scenario is under the /scripts/evaluation/ folder.
  • Once configured, the evaluation framework can be run over the complete AAP QnA set (aap_doc_qna.parquet under the /scripts/evaluation/eval_data/ folder), or over the sample QnA set defined in the scripts/evaluation/eval_data/aap-sample.parquet file.
  • To run the eval framework, run the following command:
OPENAI_API_KEY=IGNORED python -m scripts.evaluation.driver --qna_pool_file /Users/sjaiswal/Sumit/wisdom/ansible-chatbot-service/scripts/evaluation/eval_data/aap-sample.parquet --eval_provider_model_id my_rhoai+granite3-8b --eval_metrics answer_relevancy answer_similarity_llm cos_score rougeL_precision --eval_modes ols_rag --judge_model granite3-8b --judge_provider my_rhoai --eval_query_ids qna1 qna2 qna3 qna4 qna5
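The setup steps above can be sketched end to end as shell commands. This is a minimal sketch: the repository URL is an assumption (only the repo name ansible-chatbot-service appears in this PR), and the provider/model IDs (my_rhoai, granite3-8b) are taken from the example command, so adjust them for your environment.

```shell
# Clone the service repo (URL is an assumption based on the repo name)
git clone https://github.com/ansible/ansible-chatbot-service.git
cd ansible-chatbot-service

# Create a virtual environment and install dependencies
make install-deps

# Copy the sample AAP olsconfig.yaml to the parent directory, as described above
cp scripts/evaluation/olsconfig.yaml ../olsconfig.yaml

# Run the evaluation over the sample QnA set (relative path used here
# instead of the absolute path shown in the original command)
OPENAI_API_KEY=IGNORED python -m scripts.evaluation.driver \
  --qna_pool_file scripts/evaluation/eval_data/aap-sample.parquet \
  --eval_provider_model_id my_rhoai+granite3-8b \
  --eval_metrics answer_relevancy answer_similarity_llm cos_score rougeL_precision \
  --eval_modes ols_rag \
  --judge_model granite3-8b \
  --judge_provider my_rhoai \
  --eval_query_ids qna1 qna2 qna3 qna4 qna5
```

To run over the full AAP QnA set instead, point --qna_pool_file at scripts/evaluation/eval_data/aap_doc_qna.parquet.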

@justjais justjais changed the title <WIP DNM>PR to apply for E2E OLS evaluation framework for AAP chatbot PR to apply for E2E OLS evaluation framework for AAP chatbot Jan 21, 2025
@justjais justjais marked this pull request as ready for review January 21, 2025 18:22
@TamiTakamiya
Collaborator

@justjais I have rebased ansible-chatbot-service to the latest upstream (road-core/service). Would you rebase to the current main branch? Sorry for causing extra work.


@TamiTakamiya TamiTakamiya left a comment


@justjais I think we want to update those evaluation-related files whenever OLS updates their files. For that purpose, would you add the following changes?

  1. Copy all files under scripts/evaluation in the OLS repo, even if they are not used. It will be easier for us to have them when we import changes from the OLS repo.
  2. If any additional files are required, give them file names that include aap-.
  3. Create scripts/evaluation/README-aap.md to document what was changed/added from/to the OLS code.
  4. Also document how to run the tool for the Ansible chatbot in the same scripts/evaluation/README-aap.md file.

Resolved review threads: scripts/evaluation/olsconfig.yaml (3 threads), scripts/evaluation/utils/relevancy_score.py (1 thread)
@TamiTakamiya TamiTakamiya force-pushed the main branch 2 times, most recently from f26a7a1 to 8d092ac Compare January 30, 2025 17:18
@justjais justjais requested a review from TamiTakamiya January 31, 2025 09:48
@justjais justjais merged commit 265d1c6 into main Jan 31, 2025
24 checks passed
@justjais justjais deleted the aap_38439 branch January 31, 2025 13:00